
Michaël E. Sander

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

May 12, 2026
Via arXiv

Differentiable Knapsack and Top-k Operators via Dynamic Programming

Jan 29, 2026
Via arXiv

Clustering in Deep Stochastic Transformers

Jan 29, 2026
Via arXiv